Parallel decoding of interleaved data streams within an MPEG decoder

ABSTRACT

An MPEG decoder in a high definition television receiver decodes and decompresses MPEG coded data to produce decompressed image pixel blocks, and includes a motion compensation network coupled to a frame memory to produce finally decoded pixel data for display. The decompressed MPEG data is recompressed by plural parallel recompressors prior to storage in frame memory. Each recompressor receives a respective datastream of interleaved pixel data, and predicts and compresses interleaved pixel values on each clock cycle. One of the recompressors is de-energized in a reduced data processing mode when pixel data is subsampled prior to recompression. Subsampled data is re-ordered prior to recompression. Multiple parallel decompressors coupled to the frame memory provide pixel data to the motion processing network. A control unit ensures an uninterrupted interleaved data flow to the decompressors by repeating the last valid data when source data is interrupted.

FIELD OF THE INVENTION

This invention relates to processing digital image representative information.

BACKGROUND OF THE INVENTION

Rapid advances in digital technology have produced corresponding advances in digital image signal processing in various fields such as high definition television (HDTV). The MPEG (Moving Picture Experts Group) signal compression standard for MPEG-2 video processing (ISO/IEC International Standard 13818-2, Jan. 20, 1995) is a related development. This widely accepted image processing standard has been found to be particularly attractive for use with satellite, cable and terrestrial broadcast systems including HDTV systems.

A digital HDTV terrestrial broadcast system recently adopted as the Grand Alliance HDTV system in the United States defines a standard of digital broadcast of high definition (HD) program material which has been data compressed using the MPEG-2 compression standard. A description of the Grand Alliance HDTV system is found, for example, in the 1994 Proceedings of the National Association of Broadcasters, 48th Annual Broadcast Engineering Conference Proceedings, Mar. 20–24, 1994. The HD broadcast standard provides for image resolution up to 1920 pixels per line (horizontally) by 1080 lines (vertically). The MPEG-2 standard defines the procedures required to decompress the HD image for reproduction by a display device such as in a television receiver. About 80 megabits (Mb) of memory is required by an MPEG decoder to properly decode an HD image as defined in the terrestrial broadcast standard. About 96 Mb of memory would be required in a consumer receiver.

In an MPEG video signal decoder such as may be found in a television signal receiver, more than one image frame of memory is typically needed for decoding an MPEG coded digital datastream, which represents I, P and B image frames as known. Three frames of memory are generally needed for decoding an MPEG datastream. Two frames of memory are needed to store reference I or P frame data, and an additional frame of memory is used to store B frame data.

An MPEG decoder includes a DPCM loop associated with a motion compensation function for producing finally decoded pixel samples, as known. As disclosed in copending U.S. patent application Ser. No. 08/579,192, the DPCM loop is advantageously modified by incorporating a data compression network. This compression network re-compresses decompressed MPEG data before it is conveyed to a frame memory, thereby reducing the memory requirements of the MPEG decoder. The DPCM loop is arranged so that the value of a pixel to be compressed is dependent on the results of a predictor circuit evaluating pixels to the immediate left, directly above, and diagonally to the upper left of the pixel being processed. The predictor operation is a real-time, computationally intensive serial operation. The predictor operation is important since more than one pixel value is involved, and because good compression requires accurate prediction rather than a “guess” at a pixel value.
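The predictor function itself is not specified here. As a hedged illustration only, the sketch below uses the median edge detector (MED) from LOCO-I/JPEG-LS, a well-known predictor that evaluates exactly the three neighbors named above (left, above, upper left). It is a swapped-in technique, not the predictor of the referenced application. The function names and the zero-valued border rule are hypothetical, and the residuals are lossless (no quantization), so predicting from the original pixels gives the same residuals a closed DPCM loop would.

```python
def med_predict(left: int, above: int, upper_left: int) -> int:
    """Median edge detector (LOCO-I / JPEG-LS), used here as a stand-in predictor."""
    if upper_left >= max(left, above):
        return min(left, above)          # edge detected: take the smaller neighbor
    if upper_left <= min(left, above):
        return max(left, above)          # edge detected: take the larger neighbor
    return left + above - upper_left     # no edge: planar (gradient) estimate


def dpcm_residuals(block: list[list[int]]) -> list[list[int]]:
    """Lossless DPCM residuals for a 2-D block, scanned line by line.

    Neighbors outside the block are taken as 0 (hypothetical border rule).
    """
    rows, cols = len(block), len(block[0])
    residuals = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = block[r][c - 1] if c > 0 else 0
            above = block[r - 1][c] if r > 0 else 0
            upper_left = block[r - 1][c - 1] if r > 0 and c > 0 else 0
            residuals[r][c] = block[r][c] - med_predict(left, above, upper_left)
    return residuals


print(dpcm_residuals([[10, 12], [11, 13]]))  # [[10, 2], [1, 1]]
```

Small residuals in smooth regions are what make a good prediction valuable: they can be coded with far fewer bits than the 8-bit pixels themselves.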

SUMMARY OF THE INVENTION

In accordance with the principles of the present invention, a digital image signal processor, such as an MPEG compatible decoder for example, processes multiple datastreams comprising a predetermined sequence of interleaved image data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a portion of a television signal receiver employing an MPEG decoder and associated digital signal processing networks arranged in accordance with the principles of the present invention.

FIGS. 2–17 depict pixel block processing formats helpful in understanding the operation of compression/decompression and associated networks shown in FIG. 1.

FIG. 18 depicts pixel subsampling and upsampling.

FIG. 19 is a block diagram of apparatus for performing the process depicted in FIG. 18.

FIG. 20 shows details of a compression network of FIG. 1.

FIG. 21 shows details of a decompression network of FIG. 1.

FIG. 22 depicts a pixel arrangement helpful in understanding aspects of the operation of the network shown in FIG. 20.

FIGS. 23–26 illustrate a data flow control operation for the system of FIG. 1.

FIG. 27 is a table depicting pixel relationships during the operation of the network shown in FIG. 20.

FIG. 28 depicts an alternative arrangement of the network shown in FIG. 23.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In a disclosed embodiment of the invention, an MPEG decoder in a television receiver employs data reduction, including re-compression, between the decoder and the decoder frame memory from which image information to be displayed is derived. The system uses pipeline processing in consideration of predictor processor timing requirements, wherein three pixel (picture element) values must be made available to predict the value of a given fourth pixel. Pipeline processing slows processing (reduces bandwidth), however, because the next pixel of a block cannot be predicted until the values of its neighbors have emerged from the pipeline. This matter is resolved by interleaving pixel data from independent 8×8 pixel blocks supplied from the MPEG decompressor. Interleaving increases processing speed since it allows pixel data to be processed on alternate clocks, so that a compressed pixel value is always being generated. The re-compression function uses a reduced number of compression operations and exhibits interleaved operation with shared functions to conserve integrated circuit area.
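The throughput argument can be checked with a small scheduling model. This is a sketch under stated assumptions: the predictor/compressor pipeline is assumed (hypothetically) to have a 2-clock latency, so a pixel may not enter until the previous pixel of the same block has left the pipeline, and independent blocks are issued round-robin, at most one pixel per clock. The function name and latency value are illustrative, not taken from the source.

```python
def cycles_to_finish(num_blocks: int, pixels_per_block: int, latency: int) -> int:
    """Clock at which the last pixel leaves a predictor pipeline of `latency` clocks.

    A pixel of a block may issue only after the previous pixel of the same
    block has left the pipeline (its value is needed for prediction).
    Independent blocks are visited round-robin, one issue per clock.
    """
    next_pixel = [0] * num_blocks   # index of the next pixel to issue, per block
    ready_at = [0] * num_blocks     # earliest clock at which each block may issue
    remaining = num_blocks * pixels_per_block
    clock = last_done = rr = 0
    while remaining:
        for i in range(num_blocks):
            b = (rr + i) % num_blocks
            if next_pixel[b] < pixels_per_block and ready_at[b] <= clock:
                next_pixel[b] += 1
                ready_at[b] = clock + latency
                last_done = clock + latency
                remaining -= 1
                rr = (b + 1) % num_blocks   # give the other block the next turn
                break
        clock += 1
    return last_done


print(cycles_to_finish(1, 64, 2))  # 128: one 64-pixel block, pipeline idle every other clock
print(cycles_to_finish(2, 64, 2))  # 129: two interleaved blocks, one pixel issued every clock
```

Under these assumptions, interleaving two blocks doubles throughput without shortening the pipeline, which is the effect described above.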

FIG. 1 depicts a portion of a digital video signal processor such as may be found in a television receiver for processing an input high definition video signal. The video processor includes functions found in a conventional MPEG decoder. An MPEG encoder and decoder are described, for example, by Ang et al. in “Video Compression Makes Big Gains,” IEEE Spectrum, October 1991. These functions typically include input buffering, variable length decoding, inverse quantization, and inverse DCT transformation prior to associated motion compensation processing which produces finally decoded output samples. Additional information concerning these and related video signal processing functions is found in Weiss, Issues in Advanced Television Technology (Focal Press, Boston, USA).

The system of FIG. 1 receives a controlled datastream of MPEG coded compressed data from a source represented by unit 10 including a transport decoder which separates data packets after input signal demodulation. In this example the received input datastream represents high definition image material (1920 pixels/horizontal line × 1088 horizontal lines) as specified in the Grand Alliance specification for the United States high definition terrestrial television broadcast system. The data rate of the 1920×1088 high definition information is 94,003,200 bytes/sec, determined as follows:

(1920H × 1088V × 30F × (8+4)YC)/B

where

- H represents horizontal pixels,
- V represents vertical lines,
- F represents frames/sec,
- YC represents (luminance+chrominance) bits, and
- B represents 8 bits/byte.

In practice, the compressed MPEG datastream is provided via internal memory bus 55 and a compressed data interface included in unit 128, which receives data from control bus 114 under control of microprocessor 120. Microprocessor 120 receives the MPEG datastream via a compressed data input.
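The data rate formula can be checked directly. The 4 chrominance bits per pixel correspond to 4:2:0 sampling (two 8-bit chroma samples shared by four pixels); the function and parameter names are illustrative.

```python
def hd_data_rate_bytes_per_sec(h: int = 1920, v: int = 1088, frames_per_sec: int = 30,
                               y_bits: int = 8, c_bits: int = 4,
                               bits_per_byte: int = 8) -> int:
    """(H x V x F x (Y + C bits)) / B, following the formula above."""
    return h * v * frames_per_sec * (y_bits + c_bits) // bits_per_byte


print(hd_data_rate_bytes_per_sec())  # 94003200
```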

The input datastream from source 10 is in the form of data blocks representing 8×8 pixels. This data represents compressed, coded intraframe and interframe information. The intraframe information comprises I-frame anchor frames. The interframe information comprises predictive motion coded residual image information representing the image difference between adjacent picture frames. The interframe motion coding involves generating motion vectors that represent the offset between a current block being processed and a block in a prior reconstructed image. The motion vector which represents the best match between the current and prior blocks is coded and transmitted. Also, the difference (residual) between each motion compensated 8×8 block and the prior reconstructed block is DCT transformed, quantized and variable length coded before being transmitted. This motion compensated coding process is described in greater detail in various publications including the Weiss text and the Ang, et al. article mentioned above.

The MPEG decoder exhibits a reduced memory requirement which allows a significant reduction in the amount of external frame memory. As will be explained subsequently, this is accomplished by re-compressing decompressed video frames to be stored in memory, and by selectively horizontally filtering and decimating (i.e., subsampling or downsampling) pixel data within the decoder loop depending on the operating mode of the decoder. For example, in one mode the system provides anchor frame compression. In another mode the system provides compression after horizontal detail reduction by low pass filtering and downsampling.

The input compressed pixel data blocks are buffered by unit 12 before being variable length decoded by unit 14, which also produces motion vectors MV as known. Buffer 12 exhibits a storage capacity of 1.75 Mbits in the case of a main level, main profile MPEG datastream. Decoded compressed data blocks are output from unit 14 via a multiplexer (Mux) 15, which produces output datastreams P1 and P2. Outputs P1 and P2 represent dual data pipelines hereinafter referred to as pipe 1 (P1) and pipe 2 (P2). Pipe P1 contains a group of DCT coefficients for an 8×8 pixel block “A” of a given macroblock, followed by a group of DCT coefficients for an 8×8 pixel block “C” for that macroblock. The DCT coefficients are arranged in a diagonal or “zig-zag” scan format, as known. Pipe 1 conveys a sequence of such A, C blocks for a sequence of corresponding macroblocks. Pipe 2 similarly contains a group of DCT coefficients “B” and “D” for the given macroblock and for macroblocks sequenced therewith. The arrangement of pixel data for such pixel blocks and macroblocks in pipelined sequence is shown and will be discussed in connection with FIGS. 2–17.

The pixel block data are conveyed by the respective pipes in parallel data processing paths, each including an inverse quantizer (18, 20), an inverse Discrete Cosine Transform (DCT) unit (22, 21), output FIFO buffers (26, 28), block re-ordering units (23, 25), block interleaving units (24, 27) and adders (30, 32). Decompression and transform decoding are respectively performed by the inverse quantization units and by the inverse DCT units in each pipeline before being applied to one input of adders 30 and 32 respectively.

Reordering units 23, 25 remove the zig-zag scan pattern of the inversely DCT transformed pixel data from units 21 and 22 to produce a horizontal line-by-line pixel scan sequence for each 8×8 block. Thus, in pipe 1 for example, the output of unit 23 represents pixel values of the form a1 a2 a3 . . . a63 a64 (for block A), c1 c2 c3 . . . c63 c64 (for block C), etc. Interleaving unit 24 uses a multiplexing technique to produce pipe 1 output data of the form a1 c1 a2 c2 a3 c3 . . . a64 c64. Interleaver 27 produces a similar sequence for blocks B, D.
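A minimal sketch of the two steps, assuming the conventional 8×8 zig-zag scan (no scan table is given here) and hypothetical function names: `unzigzag` restores raster (line-by-line) order, and `interleave_blocks` produces the a1 c1 a2 c2 . . . sequence attributed to interleaver 24.

```python
from itertools import chain


def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions in conventional zig-zag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                     # s = row + col (anti-diagonal index)
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even anti-diagonals run bottom-left to top-right, odd ones the reverse
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def unzigzag(values: list, n: int = 8) -> list:
    """Reorder a zig-zag-scanned sequence into raster (line-by-line) order."""
    raster = [None] * (n * n)
    for k, (r, c) in enumerate(zigzag_order(n)):
        raster[r * n + c] = values[k]
    return raster


def interleave_blocks(first: list, second: list) -> list:
    """Alternate samples of two equal-size blocks: x1 y1 x2 y2 ..."""
    if len(first) != len(second):
        raise ValueError("blocks must be the same size")
    return list(chain.from_iterable(zip(first, second)))


a = [f"a{i}" for i in range(1, 65)]
c = [f"c{i}" for i in range(1, 65)]
print(interleave_blocks(a, c)[:6])  # ['a1', 'c1', 'a2', 'c2', 'a3', 'c3']
```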

The quantization step size of inverse quantizers 18 and 20 is controlled by a Quant Control signal from buffer 12 to assure a smooth data flow. Decoded motion vectors MV are provided from decoder 14 to a motion compensation unit 90 as will be discussed below. Decoder 14 also produces an inter/intra frame mode select control signal, as known, which is not shown to simplify the drawing. The operations performed by units 14, 18/20, 21/22 and 23/25 are the inverse of corresponding operations performed by an MPEG encoder at a transmitter. The MPEG decoder of FIG. 1 reconstitutes the received image using MPEG processing techniques which are described briefly below.

Reconstructed pixel blocks are respectively provided at the outputs of adders 30 and 32 by summing the residual image data from units 26 and 28 with predicted image data provided at the outputs of motion compensation unit 90 based on the contents of video frame memory 60. An entire frame of reconstructed image representative pixel blocks is stored in frame memory 60. In the interframe mode, motion vectors MV obtained from decoder 14 are used to provide the location of the predicted blocks from unit 90. The motion compensation process forms predictions from previously decoded pictures which are combined with the coefficient data (from the outputs of IDCT units 21 and 22) in order to recover the finally decoded samples. Motion compensation unit 90 operates in accordance with known MPEG compatible techniques as discussed, for example, in the MPEG specification and in the Weiss and Ang references mentioned previously. The A, C and B, D outputs of unit 90 represent decompressed interleaved pixel block data A, C and interleaved pixel block data B, D as will be discussed.

The image reconstruction process involving adders 30, 32, external decoder frame memory 60 and motion compensation unit 90 advantageously exhibits significantly reduced frame memory requirements due to the use of block-based parallel data compressors 40 and 42, and horizontal pixel decimation (subsampling) units 36 and 38 which reduce horizontal detail. The size of frame memory 60 may be reduced by 25%, 50% or more as a function of the data reduction achieved by recompression units 40, 42 and decimation by units 36, 38. Output data from decimation units 36 and 38 is processed by a block re-ordering unit 43 before being conveyed to compressor 40 in a reduced data operating mode when horizontal decimation is employed, as will be discussed. The effect of the re-ordering operation will be seen in connection with FIGS. 12 and 14 and related Figures. A Mode Control signal and a mode switch 45 modify the compressor operation in a reduced data operating mode when horizontal decimation units 36 and 38 are activated, as will be explained. Compressor 42 is disabled (e.g., de-energized) in the reduced data mode. At other times, e.g., when processing a high definition input signal, both compressors 40 and 42 are active.
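The frame memory savings can be estimated with simple arithmetic. This sketch assumes 12 bits per pixel (8-bit luminance plus 4:2:0 chrominance, as in the data rate formula above) and three frame stores. It counts frame storage only and ignores other decoder buffers such as the 1.75 Mbit input buffer, so it is not expected to reproduce the approximate totals quoted in the background. The function name is hypothetical.

```python
def frame_memory_bits(width: int = 1920, height: int = 1088, bits_per_pixel: int = 12,
                      frames: int = 3, reduction: float = 0.0) -> int:
    """Bits needed for `frames` stored frames after a fractional `reduction`."""
    return round(width * height * bits_per_pixel * frames * (1.0 - reduction))


for r in (0.0, 0.25, 0.5):
    print(f"{r:.0%} reduction: {frame_memory_bits(reduction=r) / 1e6:.1f} Mbit")
# 0% reduction: 75.2 Mbit
# 25% reduction: 56.4 Mbit
# 50% reduction: 37.6 Mbit
```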

Decompression units 80–84 perform the inverse of the compression performed by units 40 and 42. Unit 88 performs the inverse of the decimation performed by units 36 and 38. Additional details of these operations will be discussed subsequently. Formatting unit 86 discards unwanted decompressed lines of pixels until lines containing the decompressed pixels needed for motion compensation predictor processing by unit 90 are acquired. This selection process prevents unnecessary data from accumulating, and is advantageously used in a compression system (such as the disclosed system) which does not provide unique mapping in memory for every pixel. In this regard it is noted that a pixel value may have been compressed or quantized with 3, 4 or 5 bits for example, and the value of the pixel is not known until after decompression.

Processing an MPEG decoded input high definition datastream is advantageously accomplished by interleaving the input datastream prior to re-compression, and by processing the interleaved data using an interleaved data compression network. The data re-compression network includes similar compressors 40 and 42 which operate on an interleaved datastream. These compressors share certain logic circuits and look-up tables contained in unit 44, and operate in response to a locally generated 54 MHz (2×27 MHz) clock signal CLK produced by clock generator 50. The CLK signal is also applied to horizontal upsampling network 88. An 81 MHz (3×27 MHz) clock signal also produced by generator 50 is applied to decompressors 62, 80, 82 and 84, and to display processor 70.

Before proceeding with a description of the system operation, it will be helpful to understand the nature of the interleaved pixel data processing as illustrated by FIGS. 2–17. FIG. 2 illustrates a known MPEG macroblock configuration comprising luminance (Y) and chrominance (U, V) pixel block components. The luminance component of each macroblock is constituted by four 8×8 pixel luminance blocks Ya, Yb, Yc, and Yd. The chrominance component comprises 4×4 pixel “U” blocks Ua–Ud, and 4×4 pixel “V” blocks Va–Vd as shown. Interleavers 24 and 27 (FIG. 1) interleave these pixel blocks in data pipes P1 and P2 respectively as discussed previously and as shown in FIG. 3, which illustrates how the luminance and chrominance blocks are arranged for A, C and B, D pipeline processing. The pipelining process before interleaving is illustrated in greater detail in FIG. 4 with respect to the 4×4 pixel blocks which constitute a “U” chrominance component. FIG. 4 shows the result of the process by which units 23 and 25 place chrominance pixel blocks Ua and Uc in data pipe 1, and pixel blocks Ub and Ud in pipe 2. In the diagram, A1 represents the first pixel value (8 bit) of block A, A2 represents the second pixel value (8 bit) of block A, B1 represents the first 8 bit pixel value of block B, and so on through final values A16 and B16, and likewise through block D. Analogous observations pertain to the luminance pixel data.

FIGS. 5–7 illustrate pixel data arrangements assuming horizontal decimation is not performed by units 36 and 38 in FIG. 1. In such an operating mode, units 36 and 38 are bypassed, depending on the amount of data reduction (decimation plus compression) desired for a given system configuration. Pixel data processing with horizontal decimation enabled is illustrated by FIGS. 11–17.

FIG. 5 depicts the A, C sequence of interleaved pixel data in pipe 1 conveyed from the output of interleaver 24 to compressor 40 from buffer 26 and adder 30 without decimation by unit 36. Similarly, pixel data in pipe 2 are conveyed to compressor 42 from interleaver 27 and adder 32 in the sequence B1, D1, B2, D2, . . . etc. Partitioning of the macroblock into sections represented by A, B, C, and D data groups is not critical. For example, in another system pipe P1 could convey A, B data or A, D data. Similarly, pipe 2 could convey a data combination other than B, D. In the illustrated embodiment the A, C data conveyed by pipe 1 corresponds to “even” data blocks in accordance with the MPEG specification, and pipe 2 B, D data corresponds to “odd” data blocks in the MPEG specification.

FIG. 6 illustrates the compressed pixel data output from first compressor 40 in the first pipe after Huffman coding. Each “x” in the FIG. 6 datastream represents a “don't care” condition produced to simplify the clocking process, whereby a continuous clock (rather than a less desirable stop/start clock) encompassing 8 bits of data for each clock cycle is used. A Write Enable signal (not shown) assures that only valid compressed data are written to memory when present. For every sixteen 8-bit (chroma) pixels (16 bytes) at the input, 8 bytes of compressed data are produced at the output. Not shown is the analogous pixel data output from second compressor 42 for blocks B, D in pipe 2. Details of a compression circuit suitable for use in compressors 40 and 42 will be shown and discussed with respect to FIG. 20.

After compression by units 40 and 42, the pixel data are conveyed via a 128-bit wide (i.e., 128 parallel data lines each conveying one bit) internal memory bus 55 (FIG. 1) and a 64-bit wide external memory bus 57 to external decoder frame memory 60. Memory 60 stores the pixel block data in de-interleaved form. De-interleaving may be performed by output circuits associated with compressors 40 and 42, or by circuits prior to memory 60, under control of a local microprocessor 120. These circuits use known signal processing techniques to perform the inverse interleaving function and have not been shown to simplify the drawing. FIG. 7 shows the form of the compressed pixel data sent to memory 60 after de-interleaving. Each compressed pixel is represented by 3 to 6 bits of data. In the block of compressed A data, “a1′” does not represent pixel a1 at this point but rather 8 bits constituted by a combination of compressed pixels and overhead data. The data length of a pixel is determined by the data itself and by the location of the pixel. The number of bits used to compress the data in this chroma block is 64 bits. The original chroma data was constituted by 128 bits (8×16 bits). Similar observations apply to the “B” through “D” data.

Referring back to FIG. 1, compressed pixel data stored in memory 60 are processed for display by means of a display processing network including a display decompressor 62, FIFO display buffer 64, multiplexer 68, and display processor 70. Display buffer 64 holds sixteen image lines, divided between a pair of eight-line buffers. Decompressed data for display processing is read from one of the line buffers via multiplexer 68 while the other line buffer is being filled with decompressed data from unit 62. Buffers 64 may be located in memory unit 60. Display processor 70 may include, for example, an NTSC coding network, circuits for conditioning the pixels for display, and a display driver network for providing video signals to image reproducing device 72, e.g., a high definition kinescope or other appropriate display means.
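The two eight-line halves of buffer 64 behave as a ping-pong (double) buffer. A minimal sketch with hypothetical class and method names; it assumes the display side has finished reading a half before the writer wraps around to it, a timing constraint the hardware must satisfy and the sketch does not check.

```python
class PingPongLineBuffer:
    """Two half-buffers: one is filled by the decompressor while the other is read for display."""

    def __init__(self, lines_per_half: int = 8):
        self.lines_per_half = lines_per_half
        self.halves: list[list] = [[], []]
        self.write_idx = 0  # index of the half currently being filled

    def write_line(self, line) -> bool:
        """Store one decompressed line; return True when the halves swap roles."""
        self.halves[self.write_idx].append(line)
        if len(self.halves[self.write_idx]) < self.lines_per_half:
            return False
        self.write_idx ^= 1                # swap roles (cf. multiplexer 68 selecting a half)
        self.halves[self.write_idx] = []   # reuse the other half, assumed already read out
        return True

    def read_half(self) -> list:
        """Lines available to the display side: the half not being written."""
        return list(self.halves[self.write_idx ^ 1])


buf = PingPongLineBuffer()
swaps = [buf.write_line(n) for n in range(8)]
print(swaps[-1], buf.read_half())  # True [0, 1, 2, 3, 4, 5, 6, 7]
```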

Prior to decompression by unit 62, the pixel data are re-interleaved to exhibit an “ab” block sequence as illustrated by FIG. 8. This interleaving may be performed by suitably addressing the read out operation of memory 60, or by input logic circuits associated with display decompressor 62. Similarly, pixels c and d are re-interleaved to produce a “cd” data sequence (not shown) prior to decompression. The re-interleaving sequences for display, i.e., ab and cd, differ from the original input interleaving sequences (ac and bd). The original interleaving permitted pixel data a and b, for example, to be accessed first and data a and b were processed in parallel. The re-interleaved display sequence is appropriate for display purposes where data from the same image frame is needed (pixels a, b and c, d are in the same image frame). The sequence of interleaved decompressed pixel data for the “ab” sequence is shown in FIG. 9. A similar sequence of interleaved decompressed pixel data for the “cd” sequence (C1, D1, C2, D2, C3, D3 . . . ), not shown, is also produced. After processing by units 64, 68 and 70 the pixels of a given block are rearranged to a display format as shown in FIG. 10. This is a simplified example in 4:2:0 form rather than 4:2:2 form.
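The change from ac/bd interleaving to ab/cd interleaving can be sketched as a de-interleave followed by a re-interleave. For clarity this operates on uncompressed sample labels; in the system described, the equivalent rearrangement is done by memory addressing or decompressor input logic on compressed data. Function names are hypothetical.

```python
def deinterleave(stream: list) -> tuple[list, list]:
    """Split x1 y1 x2 y2 ... into (x1 x2 ..., y1 y2 ...)."""
    return stream[0::2], stream[1::2]


def interleave(first: list, second: list) -> list:
    """Merge two equal-length sequences as x1 y1 x2 y2 ..."""
    return [p for pair in zip(first, second) for p in pair]


def reinterleave_for_display(pipe1: list, pipe2: list) -> tuple[list, list]:
    """Convert pipe 1 (a c a c ...) and pipe 2 (b d b d ...) into ab and cd sequences."""
    a, c = deinterleave(pipe1)
    b, d = deinterleave(pipe2)
    return interleave(a, b), interleave(c, d)


ab, cd = reinterleave_for_display(["A1", "C1", "A2", "C2"], ["B1", "D1", "B2", "D2"])
print(ab)  # ['A1', 'B1', 'A2', 'B2']
print(cd)  # ['C1', 'D1', 'C2', 'D2']
```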

Referring to FIG. 1 again, the MPEG decoder loop also includes a decompression function performed by a plurality of decompressors 80, 82 and 84 in association with data formatting and horizontal up-sampling performed by units 86 and 88 respectively. The comments above concerning FIGS. 8 and 9 also apply to this control loop decompression function, wherein prior to decompression the pixel data are interleaved by circuits associated with the decompressors to exhibit an “ab” (and “cd”) data sequence as illustrated by FIG. 8.

FIGS. 11–17 illustrate pixel data sequence arrangements assuming horizontal decimation (i.e., subsampling or downsampling) by units 36 and 38 of FIG. 1 has been enabled. When data reduction in the form of horizontal decimation by units 36 and 38 is enabled, compressor 42 is disabled and only compressor 40 is used to compress data because of the reduced amount of data. Network 44 contains logic circuits and Look-Up Tables used by units 40 and 42. These circuits and tables are used by only one of the compressors when the other is deactivated in a reduced-data operating mode when data processing demands are less. In a high resolution mode when both compressors 40 and 42 operate, sharing these circuits and tables is facilitated by the interleaved data pipelining process. Specifically, unit 44 contains two Look-Up tables, one for use by compressor 40 and one for use by compressor 42. The LUT for compressor 40 is shared for compressing interleaved A and C data since these data are compressed at different times, such as on alternate clocks as will be discussed. The LUT for compressor 42 is similarly shared during compression of data B and D.

FIG. 11 depicts the sequence of pixel data applied from data pipe 1 to the input of decimation filter 36 in FIG. 1. Decimation by filter 36 produces the pixel data sequence of FIG. 12, which is applied to the input of reordering network 43. In FIG. 12 the “x”-labeled elements represent “don't care” or null data. In an H/2 mode whereby horizontal pixel data is subsampled by a factor of 2, filter 36 averages two adjacent pixels so that a1* = (A1 + A2)/2, c1* = (C1 + C2)/2, a2* = (A3 + A4)/2, and so on. This process is illustrated in FIG. 18 as will be discussed. Decimation using other subsampling factors may also be used. FIGS. 13 and 14 similarly depict the sequence of pixel data applied from data pipe 2 to decimation filter 38 in FIG. 1.
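The H/2 averaging can be sketched on an interleaved pipe 1 stream (A1 C1 A2 C2 . . .). Because each block is scanned line by line with an even number of pixels per line, each averaged pair of samples from the same block are horizontal neighbors. Integer floor division is an assumption (the rounding used by filter 36 is not stated), the “x” null slots of FIG. 12 are omitted, and the function name is hypothetical.

```python
def decimate_h2_interleaved(stream: list[int]) -> list[int]:
    """H/2 decimation of an interleaved two-block stream X1 Y1 X2 Y2 X3 Y3 ...

    Produces x1* y1* x2* y2* ... with x1* = (X1 + X2) // 2, y1* = (Y1 + Y2) // 2, etc.
    """
    if len(stream) % 4:
        raise ValueError("expected whole pairs of pixels from each block")
    out = []
    for i in range(0, len(stream), 4):
        x_first, y_first, x_second, y_second = stream[i:i + 4]
        out.append((x_first + x_second) // 2)   # e.g. a1* from A1 and A2
        out.append((y_first + y_second) // 2)   # e.g. c1* from C1 and C2
    return out


print(decimate_h2_interleaved([10, 100, 20, 200, 30, 50, 40, 70]))  # [15, 150, 35, 60]
```

The output is still interleaved but carries half as many samples, which is why a single compressor suffices in this mode.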

FIG. 15 shows the sequence of pixel data after decimation and reordering by unit 43 in FIG. 1. Specifically, the pixel data have been re-aligned by unit 43 to place them in a proper sequence for compression and storage in memory 60. In FIG. 15, pixel data a1 through c4 represent one image frame (a 2×4 matrix) after decimation, pixel data b1 through d4 represent a second image frame (2×4 matrix) after decimation, and so forth.

In FIG. 1, all the reordered pixel data from unit 43 are applied to compressor 40 via mode switch 45, since only one compressor is needed to compress the reduced amount of data resulting after decimation. Averaging data in the decimation process produces one pixel from two pixels, resulting in less data and a correspondingly reduced need for data processing bandwidth. Consequently only one compressor is sufficient, and compressor 42 is inactivated. The compressors are enabled and disabled as required in the absence or presence of decimation in response to a Mode Control signal as will be discussed.

The reordering which occurs in reorder network 43 is not a straightforward procedure such as may occur with a first-in, first-out buffer. To minimize the complexity of motion compensation loop processing including horizontal decimation, reordering and recompression, the data is presented to compressor 40 in substantially the same format as data which has not been decimated horizontally by units 36 and 38. Providing reorder network 43 separate from recompressor 40 simplifies the circuit, because recompressor 40 does not have to distinguish between data requiring reordering from units 36 and 38, and data not requiring reordering from adders 30 and 32.

FIG. 15A is a compilation of FIGS. 11 to 15, and illustrates the data flow through horizontal decimation and reordering relative to pixel timing. Datastreams 15-1 and 15-2 of FIG. 15A respectively represent data out of adders 30, 32 and into decimator networks 36, 38. Datastreams 15-3 and 15-4 respectively represent data out of decimator networks 36, 38 which are input into reorder network 43. As discussed previously, pixels are interleaved as is seen in datastreams 15-1 through 15-4. Datastreams 15-1 and 15-3 represent data from pipeline P1, and datastreams 15-2 and 15-4 represent data from pipeline P2. Datastream 15-5 represents data out of the reorder network 43, which is input to recompressor 40 via switch 45. At the bottom of FIG. 15A is a pixel clock CLK provided to demonstrate the timing of data pixels as they pass through the system. As an exemplary illustration, selected data pixels will be followed through the reordering processes. The process is the same for data from either pipeline. The pixels of datastreams 15-1 and 15-2 represent a chrominance pixel macroblock. The process is the same for luminance pixel macroblocks, but the process is more complex because the rendering is spread over four 8×8 pixel blocks instead of four 4×4 pixel blocks. The larger macroblock causes the reordering to occur over a larger number of clock cycles with four times as much data. However, the reordering principles remain the same for both luminance and chrominance data.

Pixel B1 from datastream 15-2 is decimated to fifty percent and combined with pixel B2 to form an output data pixel of the same size as one input data pixel. The same occurs for pixels D1 and D2. Decimator network 38 buffers decimated data from pixels B1 and D1 until pixels B2 and D2 are processed. This is the reason output data from decimation network 38 is invalid during the first two clock cycles. Valid data occurs during the third clock cycle as pixel b1*. Data from pixel B1 is output during the first half of the third clock cycle, and data from pixel B2 is output during the second half of the third clock cycle. The fourth clock cycle produces pixel d1* in the same manner.

Data output from pipelines P1 and P2 passes to reorder network 43, which buffers the data and accesses particular pixels in the proper order to form a continuous data flow into compressor 40. As is seen from datastreams 15-4 and 15-5 of FIG. 15A, pixels b1*, b2*, b3* and b4* must be interleaved with pixels d1*, d2*, d3* and d4*, but after corresponding a and c pixels. Therefore the pixels reside within reorder network 43 for unequal times waiting to be output. For example, pixel b1* is received by reorder network 43 during clock cycle 3 and output during clock cycle 12, whereas pixel b2* is received by reorder network 43 during clock cycle 7 and output during clock cycle 14. Pixels are directed in reorder network 43 by a state machine controlled by microprocessor 120.

To maintain constant data flow, compressor 40 expects input pixel data in the interleaved format as shown in datastreams 15-1 and 15-2. After decimation networks 36 and 38, the pixel order is changed because the two pipelines P1 and P2 are each downsampled by a factor of 2 to supply half of the data in datastream 15-5. However, the downsampled data from P1 and P2 originate from vertically adjacent blocks of the image. Compressor 40 expects pixel data interleaved from horizontally adjacent blocks. Therefore, reorder network 43 combines the downsampled data from the order shown in datastreams 15-3 and 15-4 to the order shown in datastream 15-5. This order is substantially the same as that of interleaved data not subject to downsampling in the decimation networks. Pixel blocks from both downsampled data and data not downsampled are the same size, that is, they have the same number of pixels both horizontally and vertically. The only difference is that the downsampled pixel data blocks include pixel information from two horizontally adjacent pixel blocks, as previously described. This difference is transparent to compressor 40, which allows continuous data flow. Whereas this system reorders to combine horizontally adjacent pixel blocks into a downsampled pixel block, the spirit of the invention also encompasses a system which would combine pixel blocks having a different spatial relationship.

As is seen in FIG. 15A, reorder network 43 appears to need pixels a2* to a4* and a6* to a8* from decimator network 36 (datastream 15-3) for output (datastream 15-5) before they are available. Realistically, this cannot and does not occur, but is shown to illustrate the different timing and delays which reorder network 43 must accommodate. To prevent data from being needed for output before being received by reorder network 43, unit 43 holds and delays sufficient data until all data may be processed, thereby providing a continuous data output as shown in datastream 15-5. The delay occurs with the first data to flow through pipelines P1 and P2 and reach reorder network 43, such as occurs when a television receiver is initially energized, when a channel is changed, or at any time data synchronization is established. After an initial delay, data is continuous without losing clock cycles.
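The initial delay described above can be estimated with a small model of a reorder buffer: given the clock at which each pixel arrives and the order in which pixels must leave (one per clock), the smallest start-up delay D is the one that never asks for a pixel before it has arrived. This is a sketch only. The arrival clocks and output order in the example are hypothetical and are not read from FIG. 15A, a pixel is assumed to be able to leave on the clock it arrives, and the function names are illustrative.

```python
def min_startup_delay(arrival: dict[str, int], output_order: list[str]) -> int:
    """Smallest D such that output_order[k], emitted at clock D + k, has arrived by then."""
    return max(0, max(arrival[p] - k for k, p in enumerate(output_order)))


def peak_occupancy(arrival: dict[str, int], output_order: list[str], delay: int) -> int:
    """Largest number of pixels held in the buffer (arrived but not yet emitted)."""
    emit = {p: delay + k for k, p in enumerate(output_order)}
    last_clock = max(emit.values())
    return max(
        sum(1 for p in output_order if arrival[p] <= t < emit[p])
        for t in range(last_clock + 1)
    )


# Hypothetical example: pixel "c" arrives late but must leave second.
arrival = {"a": 0, "b": 1, "c": 5, "d": 2}
order = ["a", "c", "b", "d"]
delay = min_startup_delay(arrival, order)
print(delay, peak_occupancy(arrival, order, delay))  # 4 3
```

Once the start-up delay has been absorbed, output proceeds one pixel per clock, matching the continuous flow described above; the peak occupancy indicates how much storage the reorder buffer needs.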

FIG. 16 depicts the sequence of compressed pixel data at the output of compressor 40. In FIG. 16, compressed data “m” designates compressed pixel data derived from pixels a and b after decimation (i.e., a 4×8 pixel block produced by decimating an 8×8 pixel block). Similarly, compressed data “n” designates compressed data derived from pixels c and d after decimation. Pixel data a and b are in the same image field, and pixel data c and d are in the same image field. The pixel block compression process performed by compressor 40 is designed to operate on 8×8 pixel blocks. After decimation, a resulting 4×8 pixel “a” block and a 4×8 pixel “b” block are combined to produce an 8×8 pixel block, which is compressed to produce block “m.” Analogous observations pertain to the formation of compressed blocks “n” from decimated 4×8 blocks “c” and “d.” In this manner blocks in the same image frame are properly aligned for efficient MPEG decoding. FIG. 17 depicts the arrangement of the properly frame-sequenced compressed blocks as conveyed to and stored by memory 60.
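The block rebuilding described above can be sketched in Python as follows. This is a minimal sketch, assuming that a “4×8” decimated block is stored as 8 rows of 4 pixels and that block “a” supplies the left half of the rebuilt 8×8 block and block “b” the right half; the text does not specify the placement, and the function names are hypothetical.

```python
def combine_decimated(block_a, block_b):
    """Join two horizontally decimated blocks side by side into one 8x8 block.

    Each input is a list of 8 rows of 4 pixels (an 8x8 block after 2:1
    horizontal decimation). Columns 0-3 of the result come from block_a and
    columns 4-7 from block_b (an assumed placement)."""
    if len(block_a) != 8 or len(block_b) != 8:
        raise ValueError("each decimated block must have 8 rows")
    combined = []
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != 4 or len(row_b) != 4:
            raise ValueError("each decimated row must have 4 pixels")
        combined.append(list(row_a) + list(row_b))
    return combined


def split_combined(block):
    """Inverse of combine_decimated: recover the two 4-wide blocks."""
    return [row[:4] for row in block], [row[4:] for row in block]


# Usage: two decimated blocks become one block the 8x8 compressor accepts.
a = [[r * 4 + c for c in range(4)] for r in range(8)]
b = [[100 + r * 4 + c for c in range(4)] for r in range(8)]
m = combine_decimated(a, b)
print(m[0])  # [0, 1, 2, 3, 100, 101, 102, 103]
```

The compressor then sees an ordinary 8×8 block, which is why it needs no separate 4×8 processing mode.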

The horizontal detail reduction produced by the decimation network further reduces decoder memory requirements by reducing the number of pixel values stored in memory 60. Decimation network 36, 38 employs a horizontal spatial low pass filter followed by 2:1 horizontal decimation (downsampling) before providing data to memory 60. After decompression by units 80, 82 and 84, the resolution of image information from memory 60 is reconstituted by unit 88 using a pixel repeat up-sampling process. Up-sampling is not required between display decompressor 62 and display processor 70, since processor 70 provides the required horizontal sample rate conversion. It is expected that display decompressor 62 and processor 70 will not perform upsampling in a reduced cost receiver because of the reduced display resolution provided by such a receiver. In such a case the memory-reduced decoded frames have higher resolution than a standard definition display. For example, decoding a 1920×1088 pixel video sequence for display on a 720×480 pixel display device requires that images stored in frame memory have a resolution of 960×1088 (with horizontal decimation by a factor of two). Thus display decompressor 62 does not need to upsample images, but display processor 70 must downsample the 960×1088 image to 720×480 to be suitable for display.

FIGS. 18 and 19 respectively illustrate the general arrangement of elements associated with the pixel subsampling process performed by units 36, 38 in FIG. 1, and the pixel upsampling performed by unit 88. In units 36 and 38 the original pixels are first low pass filtered by an even order low pass filter 102 before being decimated by two, whereby every other pixel value is removed by unit 104. These pixels are stored in memory 60. Afterwards, pixel data from memory 60 are repeated by element 106 of upsampling unit 88 using well known techniques.

In this example filter 102 is an 8-tap symmetrical FIR filter. This filter operates in the horizontal spatial domain and filters across block boundaries. The 8-tap filter has the effect of shifting the relative position of the output pixels by one-half sample period relative to the input, as shown in FIG. 18. As also shown in FIG. 18, pixel repeat up-sampling has the effect of maintaining the same spatial position of the downsampled/upsampled pixels relative to the original pixels. Decimation filter unit 104 may be a two-tap filter, so that for input pixels x and y the filter output is (x+y)/2, and decimation is accomplished by dropping every other pixel. This filter does not cross block boundaries, is easy to implement, and is a good choice for horizontal decimation.
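A minimal Python sketch of the two-tap decimation option and of pixel repeat up-sampling, applied to one row of an 8-pixel-wide block. The 8-tap filter 102 is not modeled because its coefficients are not given; the integer floor rounding of (x+y)/2 and the function names are assumptions.

```python
def decimate_2tap(row):
    """Two-tap (x + y) / 2 filter on non-overlapping pixel pairs, keeping one
    output per pair (i.e., filtering, then dropping every other pixel).
    Integer pixels; floor rounding is an assumption."""
    if len(row) % 2:
        raise ValueError("row length must be even")
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row), 2)]


def upsample_repeat(row):
    """Pixel repeat up-sampling: each stored pixel is output twice."""
    out = []
    for p in row:
        out.extend((p, p))
    return out


block_row = [10, 20, 30, 40, 50, 60, 70, 80]   # one row of an 8x8 block
stored = decimate_2tap(block_row)               # [15, 35, 55, 75] written to memory
restored = upsample_repeat(stored)              # [15, 15, 35, 35, 55, 55, 75, 75]
```

Because each pair lies inside one 8-pixel row of a block, the two-tap filter never reads across a block boundary, which is the property the text cites.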

The television receiver system shown in FIG. 1 has been simplified so as not to burden the drawing with excessive detail. For example, not shown are FIFO input and output buffers associated with various elements of the system, read/write controls, clock generator circuits, and control signals for interfacing to external memories, which can be of the extended data out (EDO) type or synchronous (SDRAM) type. The system of FIG. 1 additionally includes a microprocessor 120 for sending and receiving data, read/write enable and address information, for example, as well as bus interface 122 and controller 126 coupled to an internal control bus 114. In this example microprocessor 120 is located external to the integrated circuit containing the MPEG decoder.

Display processor 70 includes horizontal and vertical resampling filters as needed to convert a decoded image format to a predetermined format for display by unit 72. For example, the system may receive and decode image sequences corresponding to formats such as 525 line interlaced, 1125 line interlaced, or 720 line progressive scan. Processor 70 also produces clocks and H and V sync signals associated with the image display, and communicates with frame memory 60 via internal memory bus 55.

External bus interface network 122 conveys control and configuration information between the MPEG decoder and external processor 120, in addition to input compressed video data for processing by the MPEG decoder. The MPEG decoder system resembles a co-processor for microprocessor 120; e.g., microprocessor 120 issues a decode command to the MPEG decoder for each frame to be decoded. The decoder locates the associated header information, which in turn is read by microprocessor 120. With this information microprocessor 120 issues data for configuring the decoder, e.g., with respect to frame type, quantization matrices, etc., after which the decoder issues appropriate decode commands. Variable length decoder 14 communicates via memory bus 55, and interfacing circuits 128 facilitate communication between memory bus 55 and control bus 114.

Mode control data, programmed by the receiver manufacturer, is conveyed by microprocessor 120 in association with memory controller 134 and controller 126 for establishing the compression/decompression factors for units 40, 42 and 80–84, and for controlling the status of the compression and decompression networks and the upsampling and downsampling networks as required by system design parameters. Microprocessor 120 also partitions memory 60 into frame storage sections, frame storage and bit buffers, and on-screen display bit map sections for MPEG decoding and display processing. Local memory control unit 134 receives Horizontal and Vertical Sync (e.g., from unit 70) and data Request inputs, and provides Acknowledge outputs as well as memory Address, Read Enable (R_en), and Write Enable (W_en) outputs to various system circuits including buffer control circuits. Unit 134 generates real time address and control signals for controlling memory 60. Output clock signals CLK_out are provided in response to input clock signal CLK_in, which may be provided by a local clock generator such as unit 50. The system of FIG. 1 can be used with all Profiles and Levels of the MPEG specification in the context of various digital data processing schemes, such as may be associated with terrestrial broadcast, cable, and satellite transmission systems, for example.

In this embodiment video frame memory 60 is located external to an integrated circuit which includes the MPEG decoder and associated elements in FIG. 1. Display processor 70 may include some elements which are not on the MPEG decoder integrated circuit. The use of such an external memory device allows the manufacturer of the receiver to select a memory device which is economically sized so as to be compatible with the intended use of the receiver, e.g., for full high definition display or reduced definition display, when the receiver receives a high definition datastream. The large amount of memory normally used for MPEG decoding presently requires, as a practical matter, that the memory be external to the decoder integrated circuit. Future advances in technology may permit the memory to be located on the same integrated circuit as the MPEG decoder elements. However, the use of an external memory device gives a manufacturer the freedom to choose a memory size consistent with the display resolution and other features of the receiver.

In practice, a receiver manufacturer will decide whether to configure a receiver as an expensive premium model with extended features, or as a more economical model with fewer features. One of the features of interest is the resolution of a displayed image. In a reduced cost receiver, factors which contribute to cost reduction include a less expensive reduced resolution image display device and the amount of memory associated with the MPEG decoder.

In this example the memory requirement drops to 64 Mbits when the compressor network compresses data by 25%, and drops to an even more economical 48 Mbits when data is compressed by 50%. The 25% compression factor would be associated with a full HD image display and would be virtually indistinguishable from full MPEG decoding without compression. With 50% compression a trained observer may be able to find barely noticeable artifacts. In either case the decoded image sequence would exhibit full 1920×1088 HD resolution for display by a full HD resolution image display device.

Full HD image resolution is not required in some cases, such as when a receiver model uses an inexpensive display device with less than full HD resolution capability. In such a case it is desirable to receive and decode HD information without displaying full HD resolution images. In such a receiver, decimator network 36, 38 and compressor network 40 can be used together to significantly reduce the decoder memory requirements. For example, the decimator network may horizontally decimate data by a factor of 2, and the compressor network may compress the decimated data by 50%. This results in a greatly reduced decoder memory requirement of 32 Mbits. In this case an image for display exhibits 960×1088 resolution, which is sufficient for either 1H or 2H receiver applications. Thus a low cost receiver capable of decoding full HD image datastreams can be constructed using only 32 Mbits of MPEG decoder memory. The operation described above is accomplished in response to the Mode Control signal provided to switch 45 by microprocessor 120. Depending on whether the MPEG decoder is situated in a high definition receiver or a receiver with reduced resolution, microprocessor 120 is programmed to determine the amount of compression and whether the decimator network is enabled to downsample data or is bypassed.
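The memory figures quoted in this section (64, 48 and 32 Mbits, together with the 80 Mbit figure for full MPEG decoding cited earlier) are consistent with a simple model in which about 16 Mbits are not subject to recompression and about 64 Mbits of frame storage scale with the decimation and compression factors. That split is inferred from the quoted numbers, not stated in the text; the Python sketch below only reproduces the arithmetic.

```python
# Inferred split (not stated in the text): storage that is never
# recompressed or decimated, and frame storage that is.
FIXED_MBIT = 16
FRAME_MBIT = 64


def decoder_memory_mbit(compression=0.0, decimation=1):
    """Approximate decoder memory in Mbit under the inferred model.

    compression: fraction removed by the recompressor (e.g. 0.25 or 0.5).
    decimation: horizontal decimation factor (1 = bypassed, 2 = 2:1)."""
    return FIXED_MBIT + FRAME_MBIT / decimation * (1.0 - compression)


for label, comp, dec in [("no recompression", 0.0, 1),
                         ("25% recompression", 0.25, 1),
                         ("50% recompression", 0.5, 1),
                         ("2:1 decimation + 50%", 0.5, 2)]:
    print(f"{label:24s} {decoder_memory_mbit(comp, dec):5.1f} Mbit")
```

The four cases print 80.0, 64.0, 48.0 and 32.0 Mbit, matching the figures in the text.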

The system of FIG. 1 exhibits a first data processing mode for processing a signal containing a large amount of data, such as a high definition television signal for display by a high definition display device, and a second mode for processing a reduced amount of data. The second mode may be employed, for example, in an economical receiver including a reduced cost display device with less data resolution capability (i.e., a non-HDTV display device).

The state of switch 45 in FIG. 1 is controlled by the Mode Control signal, which may be programmed by a receiver manufacturer to indicate the type of data to be displayed by the receiver, e.g., high definition (first mode) or less than high definition (second mode). Switch 45 would be controlled to produce first mode operation if a received high definition signal is to be displayed by a high definition display device, and second mode operation if a high definition input signal is subsampled by units 36 and 38 to produce less than high definition image information for reproduction by a more economical display device having less than high definition resolution.

In the first mode, decimator units 36 and 38 are bypassed and data blocks to be compressed are conveyed directly to compressor 42, and to compressor 40 via switch 45. In this mode the Mode Control signal is applied to a control input of compressor 42 to enable compressor 42. In the second mode, the state of the Mode Control signal disables compressor 42, in this embodiment by removing power from compressor 42, while enabling the data from adder 30 to be conveyed to active compressor 40 via switch 45. Disabling compressor 42 by removing power is particularly advantageous in an integrated circuit device intended to process HDTV information, because of the power (heat dissipation) limitations of such integrated circuits due to high clock frequencies, large surface area, and the large number of active elements integrated thereon. In a gated clock system, removing power can effectively be accomplished by stopping the compressor clock. An additional advantage of such operation is that the compressor need only operate in similar block processing modes such as 8×8 and 4×8. That is, compressor 40, for example, need not be re-programmed to process 4×8 pixel blocks as produced by the decimation process. Block reordering unit 43 rebuilds blocks after decimation to produce, from 4×8 pixel blocks, an 8×8 pixel block compatible with the compressor block processing algorithm.

The Mode Control signal is also applied to a control input of horizontal upsampling network 88 for bypassing the upsampling function in operating modes when decimation by units 36 and 38 is not employed. For this purpose unit 88 may employ a relatively simple switching arrangement for switching the output signal from unit 86 directly to unit 90 in such bypass mode.
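The mode-dependent settings described for switch 45, compressor 42 and upsampling network 88 can be summarized as a small lookup. This is an illustrative Python sketch only: the integer mode encoding, the class and the field names are hypothetical, and compressor 40 is omitted because it is active in both modes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatapathConfig:
    decimators_enabled: bool      # units 36, 38 (bypassed when False)
    compressor_42_powered: bool   # unpowered / clock-stopped when False
    upsampler_bypassed: bool      # unit 88 routes unit 86 output straight to unit 90


def configure(mode):
    """Map a hypothetical Mode Control value (1 = HD display, 2 = reduced
    resolution display) to the datapath settings described in the text."""
    if mode == 1:
        return DatapathConfig(decimators_enabled=False,
                              compressor_42_powered=True,
                              upsampler_bypassed=True)
    if mode == 2:
        return DatapathConfig(decimators_enabled=True,
                              compressor_42_powered=False,
                              upsampler_bypassed=False)
    raise ValueError(f"unknown mode: {mode}")
```

In both modes the upsampler is bypassed exactly when the decimators are bypassed, so the upsampling in unit 88 always undoes a decimation that actually took place.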

Compression prior to storing data in memory 60 requires that data be decompressed prior to unit 90 in the motion compensation processing loop. This is accomplished by block-based decompressors 80, 82 and 84, which perform the inverse of the operation of compressors 40 and 42. Block-based display decompressor 62 uses a decompression technique similar to that used by decompressors 80–84, and decompresses stored pixel data before it is conveyed to display processor 70. When downsampling network 36, 38 is enabled prior to memory 60, output data from memory 60 is upsampled prior to unit 90 in the motion compensation processing loop by unit 88, which performs the inverse of the operation of network 36, 38.

The system of FIG. 1 advantageously employs a plurality of parallel block decompressors represented by units 80, 82 and 84 in the motion compensation loop. Nine decompressors are used in this example, three in each of units 80, 82 and 84, to allow all pixels to be decompressed individually. Each of these decompressors has an associated FIFO input buffer. Three decompressors (e.g., in unit 80) are used to decompress luminance pixel data in an MPEG forward prediction mode, and three decompressors (e.g., in unit 82) are used to decompress luminance pixel data in an MPEG backward prediction mode. Since chrominance information is half that of luminance, only three decompressors (e.g., in unit 84) are used to decompress chrominance pixel data. All nine decompressors are needed for worst case MPEG B-picture decoding, which requires bi-directional motion compensation predictive processing: B-picture prediction requires two image frames (forward and backward), whereas MPEG P-picture prediction requires only one image frame.

The motion compensation predictor block may not (and often does not) coincide with a block boundary. Instead, several blocks may have to be called from frame memory 60. In a worst case situation in an MPEG-2 system with one-half pixel resolution, the motion compensation predictor block may overlap six blocks, so six blocks must be accessed from memory. In a system such as the disclosed system, with recompression in the motion compensation loop (via units 40, 42), pixels cannot be accessed directly. All the block pixels must first be decompressed, which requires much overhead in the six-block worst case situation and produces much more data than is needed. Unneeded pixel information is discarded by formatting unit 86, as mentioned previously, but only after all pixels have been decompressed.

In large data processing situations such as the six-block situation mentioned above, decompression before storage greatly increases the buffer memory size requirements associated with handling the decompressed pixel information. Instead, it has been found preferable in the disclosed system to decompress data in parallel as disclosed, and afterwards to discard (via unit 86) unneeded decompressed pixel data that is not associated with the predictor block. This procedure advantageously requires significantly less buffer storage capacity. Thus, although the buffer memory bandwidth (data capacity) requirement is reduced, more integrated surface area is needed. However, the use of several decompressors in parallel produces the additional advantage of faster operation and associated faster access to the pixel data needed for motion compensation predictor processing.

The plural decompressors are not pipelined. Each decompressor and its associated buffer operates independently to deliver data, so that pixel data are delivered quickly. Delays in the operation of one decompressor/buffer network do not affect the operation of other decompressor networks. The decompressors also exhibit interleaved operation with respect to pixel data, which facilitates the independent operation of each decompressor. Also, like the compressor network, decompressors 80, 82 and 84 share a common look-up table (LUT) in unit 44.

Various types of compression, including quantization and transformation, may be used by network 40, 42 depending on the requirements of a particular system. The disclosed system uses fixed-length compression, although variable length compression or adaptive fixed/variable compression may also be used.

The type of compression used should preferably exhibit certain characteristics. Each block should be compressed by a predetermined amount so that the location of each compressed block is easily determined. Each block should be compressed and decompressed independently of other blocks, so that any block can be accessed without having to read any other block. The compression/decompression process should not produce objectionable artifacts in a reproduced image. A compression factor of 25% is essentially transparent compared to conventional decoder processing without such compression. At 50% compression the results are less transparent, but the visible results are acceptable and are not considered to be significantly different compared to conventional decoder processing without compression and memory reduction.
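The benefit of a predetermined compression amount is that block addresses reduce to a multiplication. The following Python sketch assumes 8-bit samples and a bit-granular offset (both assumptions, as are the names); a real memory controller would map this onto its own word addresses.

```python
BLOCK_PIXELS = 64        # 8x8 block
BITS_PER_PIXEL = 8       # assumed uncompressed sample size


def compressed_block_bits(compression_pct):
    """Fixed size of every compressed block for a given compression percentage."""
    if not 0 <= compression_pct < 100:
        raise ValueError("compression_pct must be in [0, 100)")
    return BLOCK_PIXELS * BITS_PER_PIXEL * (100 - compression_pct) // 100


def block_offset_bits(block_index, compression_pct, base=0):
    """Bit offset of a block, computed directly, with no need to read or
    parse any preceding block."""
    return base + block_index * compressed_block_bits(compression_pct)


print(compressed_block_bits(25), compressed_block_bits(50))  # 384 256
print(block_offset_bits(10, 50))                              # 2560
```

With variable-length compression the same lookup would instead require an index of block sizes, or a scan of the preceding blocks.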

FIG. 20 illustrates the fixed compression network used in each of recompressors 40 and 42 in FIG. 1. The compression network employs a differential pulse code modulation (DPCM) loop with adaptive prediction. The philosophy of such DPCM processing with prediction is to remove mutual redundancy between successive pixels and produce only difference information. This well-known process is generally described by A. K. Jain in Fundamentals of Digital Image Processing (Prentice-Hall International), page 483 et seq.

Before discussing the circuit of FIG. 20, reference is made to FIG. 22. FIG. 22 shows an exemplary arrangement of a group of four pixels a, b, c and x (the pixel to be predicted) associated with the predictive processing operation of the DPCM network. This group of pixels is referenced in the 8×8 pixel block shown in FIG. 22. Each pixel block is scanned in a raster manner as shown in FIG. 22, from left to right in a downward direction. In this example, for luminance information, pixel b is delayed by one pixel interval relative to pixel c, pixel a is delayed by a seven pixel interval relative to pixel b, and pixel x is delayed by one pixel interval relative to pixel a. For chrominance information, pixel “a” is delayed by a three pixel interval relative to pixel b.

In DPCM predictive processing the current pixel being coded is predicted by using previously coded pixels, which are known to decompressors 62, 80, 82 and 84 (FIG. 1). In FIG. 22, where pixel x is the pixel value to be predictively coded, pixels a, b and c have been predictively coded previously and are known to the decompression networks. A prediction of x, X_pred, uses the values of a, b and c in accordance with the following pseudo code, which describes the algorithm logic to be used:

    if (|a − c| < e1 && |b − c| > e2), X_pred = b
    else if (|b − c| < e1 && |a − c| > e2), X_pred = a
    else X_pred = (a + b)/2

Values e1 and e2 are constants representing predetermined thresholds. This algorithm is used only for pixels not located in the first row or the first column of the block being processed. Some exceptions are handled as follows: the first pixel in a block is coded very finely without reference to any other pixel, pixels in the first row use pixel value a as the predictor, and pixels in the first column use pixel value b as the predictor. Basically, this algorithm attempts to detect an edge. In the first case, a vertical edge is suggested between pixels c and b and between pixels a and x, so b is the best predictor. The second case suggests a horizontal edge between a and c and between b and x, so a is the best predictor. In the third case, no obvious edge is found. In this case a and b are equally good predictors, so their average value is used.
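The pseudo code and its exceptions can be expressed as a runnable Python sketch. It assumes the FIG. 22 layout implied by the edge discussion (c above-left, b above, a to the left of x), integer floor rounding for the average, and example threshold values e1 = 8 and e2 = 32; the actual thresholds are not given in the text.

```python
def predict(a, b, c, e1, e2):
    """Edge-adaptive prediction of x from left (a), upper (b), upper-left (c)."""
    if abs(a - c) < e1 and abs(b - c) > e2:
        return b              # vertical edge: predict from the pixel above
    if abs(b - c) < e1 and abs(a - c) > e2:
        return a              # horizontal edge: predict from the pixel to the left
    return (a + b) // 2       # no clear edge: average (floor rounding assumed)


def predict_at(recon, row, col, e1=8, e2=32):
    """Predictor for position (row, col) in a block of previously
    reconstructed pixels, including the first-row/first-column exceptions.
    Returns None for the first pixel, which is coded without prediction."""
    if row == 0 and col == 0:
        return None
    if row == 0:
        return recon[row][col - 1]      # first row: use a (left neighbor)
    if col == 0:
        return recon[row - 1][col]      # first column: use b (pixel above)
    a = recon[row][col - 1]
    b = recon[row - 1][col]
    c = recon[row - 1][col - 1]
    return predict(a, b, c, e1, e2)


print(predict(a=10, b=200, c=12, e1=8, e2=32))   # 200: vertical edge, uses b
print(predict(a=180, b=10, c=12, e1=8, e2=32))   # 180: horizontal edge, uses a
```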

The compression network of FIG. 20 quantizes difference (residual) pixel values produced as a result of DPCM processing. FIG. 20 uses an interleaved DPCM loop with two predetermined delays and parallel rather than serial processing. The circuit shown in FIG. 20 corresponds to that employed by recompressor 40 in FIG. 1 for processing interleaved pixel data A and C in the sequence shown in FIG. 5. A similar circuit is used by compressor 42 for compressing interleaved pixel data B and D. Since the network of FIG. 20 compresses a residual value, the predictor loop must finish processing a pixel of a given block before the corresponding, co-located pixel of the corresponding next block appears. The interleaved pixel block data move independently through the circuit, which is important in a variable length coded system with input and output data of different rates.

In FIG. 20, a sequence of interleaved pixel data a, c, a, c, . . . from respective interleaved pixel blocks A, C, . . . (FIG. 5) is subjected to a one pixel delay by unit 230. A given pixel value to be compressed is applied to a non-inverting (+) input of a subtractive combiner 210. The inverting (−) input of combiner 210 receives predicted pixel values from predictor 215. The residual (difference) pixel value output from combiner 210 is subjected to quantization and inverse quantization by elements 220 and 222, respectively. The quantization provided by element 220 is fixed in this example and guarantees a desired fixed amount of data compression. Elements 230, 232, 234, 236, 238, 240 and 242 are registers (e.g., flip-flops) clocked by the 54 MHz CLK signal. Elements 230, 232, 240 and 242 (Z⁻¹) exhibit a one clock cycle delay. It takes two clocks to advance one pixel because of data interleaving. Elements 238, 234 and 236 exhibit two, six and eight clock cycle delays, respectively, as a consequence of the network processing a datastream of two interleaved pixel blocks. The output of inverse quantizer 222 approximates the input to quantizer 220 but differs by a small DC offset caused by quantization error. The output of adder 228, Input′, differs from the Input signal to combiner 210 by this same amount. The timing relationship of a sequence of interleaved input pixels a, c . . . over several clock cycles, with respect to selected circuit elements of FIG. 20, is shown in FIG. 27 and is discussed in detail below.
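The essential property of the FIG. 20 loop is that prediction uses the reconstructed value (Input′) rather than the original input, so the decompressors, which see only quantized residuals, form identical predictions. A minimal Python sketch of such a closed DPCM loop follows; the uniform quantizer with step 8, the clipping to 8-bit range and the use of the previous reconstructed pixel as the only predictor are simplifying assumptions, and the fixed-length limit on the quantized codes is not modeled.

```python
STEP = 8  # hypothetical fixed quantizer step size


def quantize(residual):
    """Uniform quantizer, rounding half away from zero (element 220 analogue)."""
    q = (abs(residual) + STEP // 2) // STEP
    return q if residual >= 0 else -q


def dequantize(q):
    """Inverse quantizer (element 222 analogue)."""
    return q * STEP


def clip(v):
    return max(0, min(255, v))


def dpcm_encode(pixels):
    """Closed-loop DPCM over a raster-ordered row of 8-bit pixels.

    The predictor is the previous reconstructed pixel (the first-row rule),
    a simplification of the FIG. 22 predictor. Returns the codes and the
    encoder-side reconstruction (Input' in FIG. 20 terms)."""
    codes = [pixels[0]]          # first pixel sent without prediction
    recon = [pixels[0]]
    for x in pixels[1:]:
        pred = recon[-1]         # predict from reconstructed, not original, data
        q = quantize(x - pred)
        codes.append(q)
        recon.append(clip(pred + dequantize(q)))
    return codes, recon


def dpcm_decode(codes):
    """Decoder mirror of the loop: same prediction from the same reconstruction."""
    recon = [codes[0]]
    for q in codes[1:]:
        recon.append(clip(recon[-1] + dequantize(q)))
    return recon


row = [100, 104, 120, 119, 60, 62, 200, 255]
codes, enc_recon = dpcm_encode(row)
```

The decoder reproduces the encoder-side reconstruction exactly, and the reconstruction error never exceeds half a quantizer step, the small offset the text attributes to quantization error.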

The network of FIG. 20 also includes an adder 228 and multiplexers 225 and 235 arranged as shown. These multiplexers form the pixel prediction network in association with predictor logic 215 and pixel delay elements 234, 236 and 238. The switching state of mux 235 is determined by luminance (Y) and chrominance (C) control signals applied to it. The Y, C control signals are produced as a function of the luminance and chrominance block interleaving shown in FIG. 3. The control signals Y, C result from a pixel counting/timing process, so that chrominance pixels are processed in sequence after macroblock luminance pixels. The Y and C control signals are used to control the amount of delay in the predictor circuit as appropriate for 8×8 luminance block processing or 4×8 chrominance block processing. Processing of chrominance pixels is enabled when a “1” logic level control signal is applied to mux 235, causing mux 235 to pass data appearing at its “1” input. Processing of luminance pixel data is enabled when a “0” logic level control signal is applied to mux 235, causing mux 235 to pass data applied to its “0” input from the output of delay unit 236. In the case of an 8×8 luminance block, the “x” predictor pixel is 8 pixels away, and mux 235 switches input delay paths to produce this greater delay.
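Because two blocks are interleaved, one pixel step within a block costs two clock cycles, so the predictor taps sit at twice the in-block distance. The Python sketch below checks this for a block W pixels wide. It is an illustrative model, not the circuit: the function names are hypothetical, and the observation that the resulting b-tap delays (8 clocks for 4-wide chrominance, 16 for 8-wide luminance) equal the sums 2 + 6 and 2 + 6 + 8 of the delays of elements 238, 234 and 236 is an inference from the stated delays.

```python
def interleave(block_a, block_c):
    """Interleave two raster-scanned blocks pixel by pixel: a1, c1, a2, c2, ..."""
    stream = []
    for pa, pc in zip(block_a, block_c):
        stream.extend((pa, pc))
    return stream


def neighbor_delays(width):
    """Delays, in clock cycles, from pixel x back to its neighbors a (left),
    b (above) and c (above-left) in a two-block interleaved stream, for a
    block `width` pixels wide. One pixel step costs two clocks."""
    return {"a": 2, "b": 2 * width, "c": 2 * width + 2}


def neighbors(stream, k, width):
    """Fetch a, b, c for the sample at stream index k using only delays."""
    d = neighbor_delays(width)
    return stream[k - d["a"]], stream[k - d["b"]], stream[k - d["c"]]


# Tag each pixel with (block, raster index) to see which samples the taps hit.
W = 8  # luminance block width
blk_a = [("A", i) for i in range(W * 8)]
blk_c = [("C", i) for i in range(W * 8)]
s = interleave(blk_a, blk_c)
r, col = 3, 5
k = 2 * (r * W + col)                 # stream index of block A's pixel (3, 5)
print(neighbors(s, k, W))             # (('A', 28), ('A', 21), ('A', 20))
```

Every tap lands on a pixel of the same block, which is why the two interleaved blocks can share one predictor without mixing their data.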

Compressed (quantized) residual pixel output data is produced at the output of quantizer 220. This compressed data (FIG. 6) is subjected to a one clock cycle delay by unit 242 before being subjected to further processing, including Huffman coding.

Two flip-flop delay elements, 232 and 240, are noted in particular. The use of elements 232 and 240 produces dual delay paths Δ1 and Δ2 and permits the prediction of adjacent pixels rather than every other pixel. Delay path Δ1 comprises circuit components between the output of delay 232 and the input of delay 240. Delay path Δ2 comprises circuit components between the output of delay 240 and the input of delay 232. Each of delay units 232 and 240 represents a one clock delay of approximately 18 nanoseconds, or one 54 MHz clock cycle. With this arrangement a compressed output pixel is clocked out of the circuit at the time a pixel to be compressed is being clocked into the circuit. Thus a compressed output pixel is produced for every input pixel to be compressed, in real time.

In other systems the principles discussed above could be used with four-times interleaving, i.e., four data pipelines and four instead of two delay paths in the system of FIG. 20. Critical processing loops can then be divided into four parts to facilitate synchronization, which may permit the use of a faster clock. Also in this case, a shared look-up table would conserve integrated circuit area. Although the input pixel blocks are interleaved in this example, the input data need not be interleaved in all systems.

The use of dual delay paths Δ1 and Δ2 facilitates tailoring the overall network delay as needed, e.g., approximately 18 nanoseconds delay in this case. In this regard, the extensive signal processing associated with each delay path provides various means for tailoring the delays. The delays exhibited by the two delay paths are not critical. The circuits are preferably optimized around the clock so that each delay is approximately one cycle of the periodic 54 MHz clock. However, in other systems it may be appropriate to tailor the clock cycles with respect to a given circuit, e.g., to produce irregular or non-periodic clock cycles. The two delay paths need not exhibit equal signal processing delays, but approximately equal delays are preferable in the disclosed system.

Two signal processing paths such as Δ1 and Δ2 can be optimized for signal processing delay more easily than one overall path, as would be the case in the absence of elements 232 and 240. With two paths as defined by elements 232 and 240, each path can begin operating without waiting for the results from the other path. In a single path system, each pixel value (e.g., the pixel value at the node at the input to elements 234, 215 and 225) must be processed by several functions, including predictor logic, adder, quantization and inverse quantization, and appear at the end of the path before the beginning of the next clock cycle. In addition, such a pixel value must be stable at that time. This is a severe constraint which is not present in the disclosed multiple path system, which offers more freedom.

The arrangement of FIG. 20, when embodied in hardware such as an integrated circuit, is capable of producing a compressed pixel output for every pixel input, in real time at a 54 MHz clock rate. The FIG. 20 arrangement affords more freedom to tailor signal processing delays and consumes significantly less surface area in an integrated device to produce the same result. Moreover, the reduced surface area exhibits less capacitance, resulting in faster operating speed capability and less power consumption. The use of a faster clock is also possible. In such a case interleaving will still produce a benefit in terms of reduced integrated circuit area (e.g., fewer compression units and associated supporting units) and better system optimization using automated design tools.

With one clock, all logic gates must be synthesized at one time. The use of two delay paths as discussed greatly simplifies the synthesis of logic gates for both compressor and decompressor networks when the integrated circuit design uses VHDL high-level language code, from which the gates are synthesized in a known manner. With two delay paths, the automatic logic design converges quickly, so that gates are synthesized faster, more accurately and more reproducibly.

Besides facilitating a more reproducible design, the described dual processing paths in FIG. 20 promote the use of interleaving to produce a bandwidth advantage and the use of shared logic elements (e.g., look-up tables). Such dual processing paths also facilitate partitioning the design into functional cells or modules as required by a particular system, such as prediction and compression calculation functions in this embodiment. Such modules can be tailored as needed to suit the requirements of a particular system design.

With regard to interleaved compressor operation, it has been found preferable to use one compressor with interleaved data using two cycles of a given fast clock rather than two compressor circuits each clocked at half the given clock. Using two cycles of one clock facilitates timing optimization via interleaving as discussed, and interleaving allows twice as much data to be processed. In the disclosed system, prediction of a given pixel value is performed during one clock cycle while calculations (such as quantization and inverse quantization) for that pixel are performed during the next clock cycle. For example, for interleaved pixel blocks A and C, pixel data from block A is predicted during one 54 MHz clock cycle while quantization calculations are being performed on pixel data from block C. During the next clock cycle, block A pixel data is subjected to quantization calculations while block C pixel data is being predicted. Thus the system alternately predicts and calculates for different interleaved blocks. Using two cycles of the 54 MHz clock affords the opportunity to optimize circuit delays using appropriate tools available for hardware circuit fabrication. The process of alternately predicting pixel values and calculating compression values is illustrated by FIG. 27.

FIG. 27 illustrates the process by which interleaved pixels “a” and “c” of associated interleaved pixel blocks A and C are processed over several cycles of the 54 MHz compression clock. Assume processing begins with first pixel a1 of first pixel block A1. Considering FIG. 27 with FIG. 20, the first clock cycle causes pixel a1 to be clocked from the input of register (flip-flop) 230 to its output, whereby pixel a1 is quantized (compressed) by unit 220 and inverse quantized by unit 222 before appearing at the input of register 232, all within the first clock cycle. At this point pixel a1 is designated as pixel a1′ because it approximates input pixel a1 but exhibits a small DC offset due to quantization error associated with processing by units 220 and 222.

The second clock cycle causes the next pixel, namely first interleaved pixel c1 of interleaved pixel block C1, to be processed in a manner similar to that described above for pixel a1. In addition, the second clock cycle causes pixel a1′ to be clocked to the output of register 232 and thereby to the prediction network including units 215 and 225. This results in a predicted value of pixel a1′ appearing at the input of register 240. Thus during the second clock cycle pixel c1 is compressed (quantized) while previously compressed pixel a1′ is subjected to predictive processing.

During the third clock cycle, predictively processed pixel a1′ is conveyed to the output of register 240, subtractively combined in unit 210, compressed by unit 220 and appears as compressed output value a1″ at the input of output register 242. Pixel a1″ is clocked from this register to subsequent Huffman coding circuits on the next clock cycle. Also during the third clock cycle, while compressed pixel value a1″ is being produced, interleaved pixel c1′ is being subjected to predictive processing by unit 215. This process continues for the remaining a_n, c_n pixels of interleaved blocks A and C, whereby during each clock cycle interleaved pixels are subjected to prediction and compression processing, respectively.
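The alternation described above can be sketched as a cycle-by-cycle schedule. This is a simplified two-stage model (a pixel is predicted on one cycle and quantized/inverse quantized on the next), not a gate-level model of FIG. 20; the function names and pixel labels are hypothetical. It also checks the loop dependency that interleaving relaxes: a pixel's prediction needs the reconstructed value of the previous pixel from the same block.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str]]

def interleave(blocks: List[List[str]]) -> List[str]:
    """Round-robin order of equal-length blocks: a1, c1, a2, c2, ..."""
    return [blk[i] for i in range(len(blocks[0])) for blk in blocks]

def schedule(blocks: List[List[str]]) -> List[Row]:
    """(cycle, pixel being predicted, pixel being quantized) for each clock cycle.

    Simplified model: a pixel is predicted on cycle t and quantized /
    inverse quantized on cycle t + 1, while the next interleaved pixel
    (from a different block) is being predicted.
    """
    seq = interleave(blocks)
    rows: List[Row] = []
    for t in range(len(seq) + 1):              # one extra cycle drains the last pixel
        predicting = seq[t] if t < len(seq) else None
        quantizing = seq[t - 1] if t > 0 else None
        rows.append((t + 1, predicting, quantizing))
    return rows

def dependencies_met(blocks: List[List[str]]) -> bool:
    """True if every pixel is predicted only after the previous pixel of the
    same block has finished quantization (so its reconstruction is ready)."""
    seq = interleave(blocks)
    predict_cycle = {px: t + 1 for t, px in enumerate(seq)}
    quantize_cycle = {px: t + 2 for t, px in enumerate(seq)}
    return all(
        predict_cycle[nxt] > quantize_cycle[prev]
        for blk in blocks
        for prev, nxt in zip(blk, blk[1:])
    )

if __name__ == "__main__":
    a = ["a1", "a2", "a3"]
    c = ["c1", "c2", "c3"]
    for row in schedule([a, c]):
        print(row)
    print(dependencies_met([a, c]))   # two interleaved blocks
    print(dependencies_met([a]))      # no interleaving
```

Running it prints seven schedule rows followed by True and False. With a single block, a2 would be predicted in the same cycle in which a1 is still being quantized, which is the timing constraint that interleaving avoids. Four interleaved blocks, as contemplated for a faster clock, also satisfy the check.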

Without data interleaving, processing would have to progress from pixel value a1, for example, to compressed output value a1″ in one clock cycle. This requirement presents a severe speed and timing constraint which is avoided by interleaved processing as described. Interleaved processing also permits shared quantization and inverse quantization logic, resulting in less integrated circuit area and power consumption.
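The point about shared quantization logic can be illustrated with a small DPCM sketch in which one quantizer/inverse-quantizer pair serves two interleaved blocks. Each block keeps its own prediction state, so interleaving does not change the coded output. The previous-pixel predictor, the step size of 4 and the initial prediction of 128 are assumptions for illustration; the actual predictor formed by units 215 and 225 is not specified here.

```python
from typing import Dict, List

STEP = 4          # hypothetical quantizer step size
FIRST_PRED = 128  # hypothetical prediction used for the first pixel of a block

def quantize(diff: int, step: int = STEP) -> int:
    """Uniform quantizer, rounding half away from zero (assumed)."""
    q = (abs(diff) + step // 2) // step
    return q if diff >= 0 else -q

def inverse_quantize(q: int, step: int = STEP) -> int:
    return q * step

def compress_interleaved(blocks: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """DPCM-code several equal-length blocks in interleaved order
    (a1, c1, a2, c2, ...) through a single shared quantizer."""
    pred = {name: FIRST_PRED for name in blocks}     # per-block predictor state
    codes: Dict[str, List[int]] = {name: [] for name in blocks}
    length = len(next(iter(blocks.values())))
    for i in range(length):
        for name, pixels in blocks.items():          # alternate between blocks
            q = quantize(pixels[i] - pred[name])      # shared quantizer
            codes[name].append(q)
            # Local reconstruction via the shared inverse quantizer; like a1' in
            # the text, it approximates the input pixel up to quantization error.
            pred[name] = max(0, min(255, pred[name] + inverse_quantize(q)))
    return codes
```

For example, `compress_interleaved({"A": [130, 140, 141], "C": [90, 92, 100]})` returns `{"A": [1, 2, 0], "C": [-10, 1, 2]}`, the same codes obtained by coding each block on its own.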

The described interleaved compressor operation can be used independently of MPEG compression, and as such represents an economical processing system for use in consumer video applications (e.g., home video systems such as VCRs and camcorders) to provide predictive compression of 25%–50% where more complex and expensive MPEG compression is not required.

Four rather than two interleaved pixel blocks could also be used, with a faster clock. In such case an entire block of four 8×8 luminance pixel blocks could be processed at once.

In the disclosed system each decompressor network is arranged as shown in FIG. 21. The decompressor circuit is similar to the compressor circuit of FIG. 20 except that element 210 is a 9-bit adder and elements 220, 222, 228 and 242 have been removed. Path Δ2 involves less processing than path Δ1. However, even the inclusion of 9-bit adder 231 in path Δ2 adds a time constraint of about 9 nanoseconds, which complicates the decompressor design. In this regard it is noted that adder 231 cannot begin computing until mux 225 has received valid data. Thus it is beneficial to reduce loop timing constraints. The use of dual delay paths accomplishes this as well as greatly simplifying the overall design.
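A matching decompressor sketch shows the single addition that remains once the quantizer stages are removed: each reconstructed pixel is the prediction plus the inverse-quantized residual. It assumes previous-pixel prediction, a step size of 4 and an initial prediction of 128, and the `(block, code)` token format is hypothetical. The sketch uses unbounded Python integers and clamps the sum to the 8-bit range rather than modeling the 9-bit adder width.

```python
from typing import Dict, Iterable, List, Tuple

STEP = 4          # hypothetical quantizer step size (must match the compressor)
FIRST_PRED = 128  # hypothetical prediction for the first pixel of each block

def decompress_interleaved(tokens: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Rebuild pixels from interleaved (block, code) tokens.

    Each block keeps its own predictor, so tokens from blocks A and C
    may alternate freely.
    """
    pred: Dict[str, int] = {}
    pixels: Dict[str, List[int]] = {}
    for block, code in tokens:
        p = pred.get(block, FIRST_PRED)
        value = max(0, min(255, p + code * STEP))   # prediction + inverse-quantized residual
        pixels.setdefault(block, []).append(value)
        pred[block] = value                          # previous-pixel prediction (assumed)
    return pixels
```

For example, the tokens `[("A", 1), ("C", -10), ("A", 2), ("C", 1), ("A", 0), ("C", 2)]` decode to A = [132, 140, 140] and C = [88, 92, 100].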

For decompression, prediction path Δ1 has been found to be the more important path. Path Δ2 has been found to be more important in the case of compression, where intensive data manipulations dictate the use of a slower 54 MHz clock.

As noted previously, each decompressor in network 80–84 operates independently so that pixel data are delivered quickly. The transfer of data is often accomplished by means of clocked devices, such as flip-flops or registers. When the data to be pipelined are derived from two or more sources, e.g., in the case of interleaved data, at any moment in time the data in some registers is from one source while data in other registers is from another source. The data flow together in response to a common data clock, but the data in successive registers are mutually independent. System operating problems can be avoided as long as both data sources, and hence the datastream (pipeline), are started and stopped synchronously.

A problem occurs when one source stops sending data while the other source continues to send data. In data intensive pipelines such as in HDTV signal processors, the large number of calculations per second is critical to producing an accurate, high quality image for display. Such systems cannot afford to interrupt the dataflow whenever one or two (or more) data sources stop sending data. In such cases it is important to control the pipeline dataflow so that proper phasing of output data provided from an uninterrupted source is maintained when the other data source is interrupted.

It is important that the data clock not be stopped in a data intensive image processing system such as an HDTV system. In such systems certain components such as compression and decompression subsystems have different input and output data processing requirements and different data rates. For example, decompressed output pixel data for display must be output continuously, which requires a continuous clock, but compressed input data to be decompressed may arrive sporadically with null intervals when a clock is not present. If the decompression clock were stopped when input data is absent, clocking out of decompressed pixel data would also stop. This would be disruptive in a data intensive high definition image processing and display system. Thus repeating data is advantageous under certain conditions as will be discussed, particularly when Huffman decoding is employed in the decompression process.

In the disclosed system, the output of the compression network (FIG. 20) is subjected to Huffman coding. Huffman decoding is associated with decompression at a decoder. Since Huffman coding/decoding is a statistical process with different input and output data rates due to different coded word lengths, buffers are used to accommodate variable data content.

As will be seen with respect to FIGS. 23 and 24, when data from separate sources are processed by a pipelined sequence of registers, feedback from every other register is used to keep one data component (from a first source) flowing through the pipeline, while the other data component (from a second source) is kept repeating upon itself. With this technique, applied to interleaved data from two sources, data can be processed through the pipeline at a desired, predicted rate when one of the data sources has stopped providing data.

Repeating data is equivalent to stopping the data clock but without start-stop synchronization problems. The use of repeating data is preferred to using no data (e.g., null words) since data cycles would be lost in recovering from delays. Repeating data is important to maintaining the integrity of data flow and is not as disruptive as sending no data.

The data repeating process can repeat data for the duration of an 8×8 pixel block (64 clock cycles) without introducing system complications. Longer repeating delays are also possible depending on the nature of the system and associated processing. For example, in the course of pixel prediction processing, up to six blocks will be stored in memory. In such case, one block can effectively be held in place (repeated) in the presence of a source disruption while other blocks are being acquired from memory. It is expected that repeat delays over 1 or 2 macroblock intervals can be tolerated.

Repeating data is preferable to adding null data when a source disruption occurs because processing null data is less efficient. Like other data, null data is stored in memory, and clock cycles are wasted recovering from a null data condition, e.g., reloading valid data after nulls are removed. This is an important consideration in a data intensive system such as a high definition television system, where memory bandwidth is very important and the number of clock cycles required for data processing should be reduced as much as possible.

In FIG. 23 the input pipes respectively convey data X and Y from separate data sources, e.g., from separate locations in a memory such as frame memory 60 in FIG. 1. Data X and Y are mutually independent and are in no particular order, i.e., they may or may not be interleaved, and represent any pixel data requested from memory. In this example the X and Y data respectively represent forward and backward motion information for use by the motion compensation network (FIG. 1). The X data must be processed even if Y data is not present, and vice-versa.

The circuit of FIG. 23 conveys data from memory 60 to decompressor network 80–84 in FIG. 1, and is well-suited to MPEG processing. In an MPEG coded P or B picture, it may happen that a given macroblock lacks either forward or backward data for processing. The arrangement of FIG. 23 recognizes this possibility.

Input FIFO buffers 332 and 334 are associated with each input for a block of data. In this example buffers 332 and 334 represent the input buffers for each decompressor 80, 82 and 84 in FIG. 1. Each buffer sends a signal Req requesting data from memory via memory controller 134 at appropriate times (e.g., in response to processing involving single versus dual direction predictions), and receives a return acknowledgment signal Ackn that data is available to be sent. The flow of data between the memory sources and the buffers is controlled by data Read/Write control signals as known.

The input X, Y data is multiplexed onto a common data line by means of a Mux 336 in response to a CLK/2 data clock, producing a pipeline of alternating X, Y data at the output of Mux 336. Data from Mux 336 is processed by a series of feedback register sections 360 and 364. The number of sections used is a function of the number of interleaved data elements, two in this case. Section 360 includes an input multiplexer 338 and cascaded registers (flip-flops) 340 and 342 arranged as shown. Each register element is clocked at 81 MHz by the CLK signal. Section 364 is arranged similarly. The output of the last register element 350 is applied to the data decompression network of FIG. 1, which decompresses data including forward and backward motion prediction information. Data must be decompressed as soon as it is received by the decompressor; the decompressor cannot wait until X, Y buffers 332, 334 are filled. In each register section, feedback is provided from the output of the last register in that section to the “1” switching input of the associated multiplexer, e.g., from the output of register 342 to the “1” input of Mux 338. The network constituted by feedback register sections 360 and 364 operates as a selective digital sample and hold network with two operating modes. In one mode data is sampled and held to produce data repeat operation. In another mode data is transmitted normally, without repetition.

Unit 356, e.g., a digital comparator, senses the state of the Req and Ackn signal lines. If a FIFO buffer generates a Req signal and a return Ackn is not received from the memory source, unit 356 generates a data Halt signal at a “1” level, or state. Data flows normally through the pipeline when the Halt signal exhibits a “0” state, but data are repeated as explained below when the Halt signal exhibits a “1” state. When an Ackn signal is not received from a given input, the Halt signal causes the last valid data component to be repeated, or recirculated, in each register section. This is illustrated by the waveforms of FIG. 24 as will be discussed. If Ackn signals are received from neither the X nor the Y input data source, the clock is stopped and no data is recirculated.

Thus when the Halt signal exhibits a 0 level such that data flows normally through the pipeline, input data X and Y are maintained in the proper interleaved (clock) phase relationship so that clocking causes output data to alternate between source X data and source Y data. This phase relationship is important to prevent mixing data. In this case the output data of each register section (e.g., at the output of registers 342 and 350) corresponds to the input data two clocks earlier (e.g., Output = Input (Z⁻²)). When the Halt signal exhibits a 1 level, the associated Mux (338 or 344) decouples the input signal from the output so that each register section simply recirculates data. These operating conditions are illustrated by FIGS. 25 and 26 respectively.
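The decision made by unit 356 can be written as a small truth function. This is a sketch of the behavior described above, not of the comparator circuit itself; the `Handshake` type and the `halt_control` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Handshake:
    req: bool   # FIFO buffer has issued Req
    ackn: bool  # memory source has returned Ackn

def halt_control(x: Handshake, y: Handshake) -> Tuple[bool, bool, bool]:
    """Return (halt_x, halt_y, clock_running).

    A source whose Req has no matching Ackn is halted, so its last valid
    component is recirculated. If both sources are stalled, the clock is
    stopped instead and nothing is recirculated.
    """
    halt_x = x.req and not x.ackn
    halt_y = y.req and not y.ackn
    if halt_x and halt_y:
        return False, False, False
    return halt_x, halt_y, True
```

A stall on the Y source alone yields `(False, True, True)`: X keeps flowing while Y is repeated.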

FIG. 24 illustrates a condition where, for example, the Halt signal exhibits a 1 level only when certain data from the source of Y data have stopped. While the Halt signal is active, the Y data is recirculated (repeated) until the Halt signal returns to a normal 0 level and Y data flows again. During this time data from source X flows without interruption. In FIG. 24 the Input waveform contains an interleaved sequence of X and Y data components. In this example an Ackn signal has not been received for the Y2 component following component X2. Thus the normally “0” state of the Halt signal from unit 356 changes to a “1” state, causing each register section 360 and 364 to repeat the last valid Y component, in this case Y1, as long as the Halt signal exhibits the “1” state. The Halt signal is coupled to a control input of multiplexers 338 and 344 such that a “1” state of the Halt signal causes each multiplexer to convey the signal coupled to its “1” switching input, in this case the Y data components.

The shaded components of the Input signal waveform represent the missing Y2 component, i.e., no Y component is being issued by the second source after component Y1. The Y1 component is repeated for three Req/Ackn cycles, whereby three Halt signals are generated and component Y1 is repeated three times, as shown in the Output waveform of FIG. 24. Afterwards, the second source generates an Ackn signal for component Y2, which appears in the Output waveform sequence following data component X5.
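The recirculation of FIG. 24 can be reproduced with a behavioral model of one feedback register section (Mux 338 feeding registers 340 and 342, with the output of register 342 fed back to the “1” input of the Mux). Because X and Y alternate on every clock, the value fed back from the second register always comes from the same source as the current slot, so selecting it during a stalled Y slot repeats the last valid Y while X keeps flowing. The model assumes X is always acknowledged, marks each stalled Y slot with Halt = 1, and covers only one section; the function name and data format are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[str, bool]   # (data component, repeated?)

def feedback_section(x_data: Sequence[str], y_data: Sequence[str],
                     y_ackn: Sequence[bool]) -> List[Sample]:
    """Behavioral model of one two-register feedback section.

    x_data: X components, one per X slot (source X assumed never to stall).
    y_data: Y components in order; one is consumed per acknowledged Y slot.
    y_ackn: one flag per Y slot; False means no Ackn, so Halt = 1 for that slot.
    Returns the components leaving the second register, in order.
    """
    assert len(y_ackn) == len(x_data)
    ys = iter(y_data)
    slots: List[Optional[Sample]] = []
    for x, ackn in zip(x_data, y_ackn):
        slots.append((x, False))                            # X slot from Mux 336
        slots.append((next(ys), False) if ackn else None)   # None: Halt = 1

    r1: Optional[Sample] = None   # first register of the section (e.g. 340)
    r2: Optional[Sample] = None   # second register (e.g. 342), fed back to the Mux
    out: List[Sample] = []
    for slot in slots:
        if slot is None:
            # Halt = 1: the Mux selects its "1" input, i.e. the fed-back value,
            # which entered two clocks earlier and so belongs to the same source.
            mux_out = (r2[0], True) if r2 is not None else None
        else:
            mux_out = slot
        r1, r2 = mux_out, r1          # one clock edge shifts both registers
        if r2 is not None:
            out.append(r2)
    if r1 is not None:                # one more clock drains the last component
        out.append(r1)
    return out
```

With the FIG. 24 scenario (no Ackn for the three Y slots after X2, X3 and X4), the output sequence is X1, Y1, X2, Y1, X3, Y1, X4, Y1, X5, Y2: Y1 is repeated three times and Y2 follows X5, as in the Output waveform. The `repeated` flag stands in for the Halt indication that accompanies the data.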

The Halt signal is also provided to a control input of the associated decompressor network for instructing the decompressor to ignore the repeated data in the datastream. As mentioned previously, the interleaved X, Y data components are independent and need not follow each other in any particular (numerical) sequence. It is only necessary that data associated with a given input follow a prescribed sequence, e.g., X5 follows X4, which follows X3, which follows X2 and so on. It is of no consequence that, for example, Y2 follows X5.
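On the receiving side, the Halt indication lets the decompressor discard repeated components and recover each source's own ordered sequence, regardless of how the two sources happen to interleave. A minimal sketch, assuming the stream alternates X, Y, X, Y, ... starting with X and that each component carries its Halt flag; the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def accept_valid(stream: Sequence[Tuple[str, bool]]) -> Dict[str, List[str]]:
    """Split an interleaved (component, halt) stream into per-source lists,
    skipping components that were repeated while Halt was 1."""
    per_source: Dict[str, List[str]] = {"X": [], "Y": []}
    for slot, (component, halt) in enumerate(stream):
        if halt:
            continue                              # repeated data: ignore
        source = "X" if slot % 2 == 0 else "Y"    # clock phase identifies the source
        per_source[source].append(component)
    return per_source
```

Applied to the FIG. 24 output, it yields X1 through X5 for source X and Y1, Y2 for source Y, each in its prescribed order.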

FIG. 28 depicts the network of FIG. 23 arranged for parallel operation. Interleaved input data from Mux 336 (FIG. 23) is provided via multiplexers 285 and 286 to parallel registers 280 and 282. Outputs from these registers are multiplexed onto a data output path via Mux 284. The operation of multiplexers 284, 285 and 286 is controlled by Halt 1 and Halt 2 control signals which are associated with respective sources and produce an effect as described in connection with FIG. 23.

1. An MPEG compatible digital signal processing system comprising: an input network for receiving a data stream of MPEG coded data; a coupling network responsive to said datastream for deriving therefrom a predetermined sequence of image data; and an image signal processor responsive to said image data wherein said coupling network comprises interleaving means responsive to said datastream of MPEG coded data for deriving therefrom at least first and second datastreams, said first datastream being constituted by a first predetermined sequence of interleaved first and second spatially adjacent pixel block components and said second datastream being constituted by a second predetermined sequence of interleaved third and fourth spatially adjacent pixel block components for producing decoded image information selectable for producing either high resolution or reduced data image reproduction of a complete image.
 2. A system according to claim 1, wherein said interleaved image data comprises data block components of an MPEG compatible macroblock containing pixel representative information.
 3. A system according to claim 1, wherein: said interleaving means produces a first datastream of interleaved first and second spatially adjacent pixel block components from each macroblock of said MPEG coded data and a second datastream of interleaved third and fourth spatially adjacent pixel block components from each macroblock of said MPEG coded data.
 4. A system according to claim 3, wherein said first, second, third and fourth pixel block components are spatially adjacent components of an MPEG compatible macroblock.
 5. A system according to claim 1, wherein said input network includes a decoder for decoding said MPEG coded datastream; and a decompressor for decompressing output signals from said decoder; wherein said interleaving network responds to output signals from said decompressor.
 6. A system according to claim 1 and further including a memory for storing image representative data; and a motion compensation network coupled to said memory; wherein said image signal processor and said motion compensation network comprise a DPCM loop.
 7. A method for processing a datastream of MPEG coded image representative data, comprising the steps of: decoding said data to produce a decoded datastream; producing from said decoded datastream a predetermined sequence of interleaved data blocks representing image pixels; processing said data blocks; and storing data blocks from said processing step; wherein said producing step comprises producing multiple datastreams, each datastream having a different predetermined sequence of mutually interleaved pixel block components selectable for either high resolution or reduced resolution data image reproduction modes for a complete image.
 8. A method according to claim 7, wherein said producing step produces a first datastream of interleaved spatially adjacent first and second pixel block components, and a second datastream of interleaved spatially adjacent third and fourth pixel block components.
 9. A method according to claim 8, wherein said interleaved pixel blocks comprise an MPEG compatible macroblock.
 10. A method according to claim 7, wherein said processing step includes DPCM processing of pixel data.
 11. A method according to claim 10, wherein said DPCM processing step includes the further steps of decompressing data blocks stored in said storing step; and motion compensation processing decompressed data blocks produced by said decompressing step.
 12. A method according to claim 9, wherein said processing step comprises the steps of predicting pixel values and compressing pixel values.
 13. A method for processing a datastream of MPEG coded image representative data, comprising the steps of: receiving an input datastream of MPEG coded data; decoding said input datastream to produce a decoded datastream of data blocks containing pixel representative information; processing said decoded datastream of data blocks to produce therefrom a first datastream comprising at least first and second groups of data block components having pixel representative information interleaved in a first predetermined sequence, and a second datastream comprising at least third and fourth groups of data block components having pixel representative information interleaved in a second predetermined sequence; and decoding said first and second datastreams to produce decoded image information selectable for reproducing complete images in either high resolution or reduced resolution image reproduction modes.
 14. A method according to claim 13, wherein said first group is constituted by first and second pixel blocks of an MPEG compatible macroblock; and said second group is constituted by third and fourth pixel blocks of an MPEG compatible macroblock.
 15. A method according to claim 14, wherein said first, second, third and fourth groups comprise the same macroblock. 